ci: add explicit top-level permissions to all workflows (least privilege) by entlein · Pull Request #39 · k8sstormcenter/node-agent

entlein · 2026-05-04T17:59:05Z

Why

Scorecard / GitHub code-scanning has 16 open Token-Permissions alerts on this repo (security_severity: high). Most are "no topLevel permission defined" — every workflow file is implicitly using GITHUB_TOKEN's broad default scope.

What

Adds permissions: read-all at the top of every workflow that was missing one (9 files).
Existing job-level permissions blocks (build.yaml's build + trigger-component-tests, component-tests.yaml's build-and-push-image, sign-object.yaml top-level, pr-created.yaml, pr-merged.yaml, incluster-comp-*.yaml) are unchanged — they already grant exactly what those jobs need.
Adds explicit job-level permissions to bypass.yaml's pr-merged caller. That job invokes incluster-comp-pr-merged.yaml as a reusable workflow; reusable workflows can't request scopes the caller hasn't granted, so top-level read-all + no-perms on the caller would starve the inner docker-build / create-release-and-retag jobs. The grants mirror pr-merged.yaml's pr-merged job exactly: id-token: write, packages: write, contents: write.

Coverage

Closes 9 of 16 open Token-Permissions alerts ("no topLevel permission defined" type).

The remaining 7 are "jobLevel '' permission set to write" findings that flag legitimate write uses:

build.yaml:87 — actions: write for trigger-component-tests (cross-workflow dispatch)
pr-created.yaml:18 + incluster-comp-pr-created.yaml:40 — security-events: write (CodeQL upload)
pr-merged.yaml:33,34 + incluster-comp-pr-merged.yaml:355 — packages: write / contents: write (image push, release retag)
sign-object.yaml:26 — top-level packages: write (image push)

These are required for the workflow's actual job and aren't removable without breaking it.

Risk

Low. No-op for any job whose existing permissions are correct (which is most of them). The one substantive change is the bypass.yaml permission grant, which mirrors the corresponding pr-merged.yaml grant verbatim — it's bringing bypass.yaml's reusable invocation in line with the explicit pattern already used elsewhere.

YAML validated: all 10 workflow files parse cleanly with yaml.safe_load.

Test plan

CI green on this PR (no privilege-related failures)
bypass.yaml workflow_dispatch run succeeds end-to-end
gh api repos/k8sstormcenter/node-agent/code-scanning/alerts?state=open count drops by 9 after merge

…ege) Scorecard flagged 9 'no topLevel permission defined' alerts (Token- Permissions, security_severity: high). GITHUB_TOKEN's default scope includes write access to most things; opting into 'permissions: read-all' at the workflow level scopes that down to read, then jobs that genuinely need write scopes override locally. Existing job-level permissions blocks (build / sign-object / pr-merged / pr-created / incluster-* / docker-build) are unchanged. Added job-level permissions to bypass.yaml's pr-merged caller — needed because that job invokes incluster-comp-pr-merged.yaml as a reusable workflow. Reusable workflows can't request scopes the caller hasn't granted, so top-level read-all + the implicit no-perms on the caller would starve the called workflow's docker-build / create-release jobs. Mirrors pr-merged.yaml's perms exactly. Closes 9 of 16 open Token-Permissions alerts. The remaining 7 are 'jobLevel X permission set to write' findings that flag legitimate write uses (image push, sigstore signing, security-events upload) — Scorecard wants those audit-acknowledged but they're not removable without breaking the workflow's purpose.

coderabbitai · 2026-05-04T17:59:19Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: cab182ee-5ddd-4636-9348-76b1b6ed0646

📥 Commits

Reviewing files that changed from the base of the PR and between 80859ac and 47b33c5.

📒 Files selected for processing (2)

.github/workflows/benchmark.yaml
.github/workflows/go-basic-tests.yaml

📝 Walkthrough

Walkthrough

Adds a top-level permissions: read-all block to multiple GitHub Actions workflows to set a least-privilege default; several jobs retain or add per-job permissions overrides where elevated scopes are required (e.g., contents, issues, pull-requests, id-token, packages, security-events).

Changes

GitHub Actions workflow permission hardening

Layer / File(s)	Summary
Workflow-Level Permission Defaults `.github/workflows/benchmark.yaml`, `.github/workflows/build.yaml`, `.github/workflows/component-tests.yaml`, `.github/workflows/go-basic-tests.yaml`, `.github/workflows/incluster-comp-pr-created.yaml`, `.github/workflows/incluster-comp-pr-merged.yaml`, `.github/workflows/pr-created.yaml`, `.github/workflows/pr-merged.yaml`	Each workflow now declares `permissions: read-all` at the top level with comments indicating least-privilege defaults.
Job-Level Permission Overrides `.github/workflows/bypass.yaml`, `.github/workflows/benchmark.yaml`, `.github/workflows/go-basic-tests.yaml`, `.github/workflows/incluster-comp-pr-created.yaml`	Specific jobs add or retain per-job `permissions` blocks that grant elevated scopes required by those jobs: `id-token: write`, `packages: write`, `contents: write/read`, `issues: write`, `pull-requests: write`, `security-events: write`.
Comments / Rationale Inline (various workflow files)	Explanatory comments added near `permissions` entries describing least-privilege defaults and that jobs should override when elevated scopes are necessary.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding explicit top-level permissions with a least-privilege approach across all workflows.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining the security motivation, detailed implementation across 9-10 files, coverage metrics, risk assessment, and test plan.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/token-permissions-least-privilege

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/benchmark.yaml:
- Around line 27-29: The workflow-level permissions were set to read-all which
removes write access for the benchmark job; in the `benchmark` job definition
add a job-level permissions override that grants `issues: write` and
`pull-requests: write` so the `Comment on PR` step has the required write scope
to post benchmark results back to the PR thread.

In @.github/workflows/go-basic-tests.yaml:
- Around line 39-41: The workflow-level permissions are too restrictive for
CodeQL upload; in the reusable workflow add a job-level permissions override for
the CodeQL job (Environment-Test) to restore upload rights by setting
permissions: contents: read and security-events: write for the job that runs
github/codeql-action/analyze, and update any callers of this workflow_call to
grant security-events: write when invoking the reusable workflow so uploads
succeed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 9ff4a244-c1cc-4a4e-a60f-58eb835e5a7e

📥 Commits

Reviewing files that changed from the base of the PR and between f0d0f72 and 80859ac.

📒 Files selected for processing (9)

.github/workflows/benchmark.yaml
.github/workflows/build.yaml
.github/workflows/bypass.yaml
.github/workflows/component-tests.yaml
.github/workflows/go-basic-tests.yaml
.github/workflows/incluster-comp-pr-created.yaml
.github/workflows/incluster-comp-pr-merged.yaml
.github/workflows/pr-created.yaml
.github/workflows/pr-merged.yaml

Addresses CodeRabbit's two findings on PR #39: 1. benchmark.yaml:29 (Minor) — peter-evans/create-or-update-comment needs issues:write + pull-requests:write to post benchmark results to the PR thread. Failure was masked by continue-on-error on the step. Add job-level permissions on 'benchmark'. 2. go-basic-tests.yaml:41 (Major) — github/codeql-action/analyze needs security-events:write to upload SARIF results. Failure was masked by continue-on-error on the CodeQL steps. Add job-level permissions on 'Environment-Test'. Note for #2: this file is invoked as a reusable workflow (workflow_call) from pr-created.yaml's pr-created job, which already grants security-events:write — so the caller envelope is already correct, only the inner job's override was missing. YAML validated.

Chore PR #39 added 'permissions: read-all' to every workflow file including the workflow_call reusables. That broke the reusable-call contract: a called workflow's top-level permissions are the FLOOR the caller must grant. Callers of these reusables (pr-created.yaml's pr-created job, pr-merged.yaml's pr-merged job, etc.) only grant a narrow set of scopes (id-token:write, packages:write, security-events: write, pull-requests:write, contents). They do NOT grant the full read-all set (actions:read, artifact-metadata:read, attestations:read, checks:read, deployments:read, discussions:read, issues:read, models:read, vulnerability-alerts:read, packages:read, pages:read, repository-projects:read, statuses:read, id-token:read). Result: every PR opened on this fork since chore #39 merged (May 6) has had pr-created.yaml startup_failure with: Error calling workflow ... incluster-comp-pr-created.yaml@<sha>. The workflow is requesting 'actions: read, ...', but is only allowed [...] Fix: strip top-level 'permissions: read-all' from the four reusables: - benchmark.yaml (also direct-trigger; falls back to per-job perms) - go-basic-tests.yaml - incluster-comp-pr-created.yaml - incluster-comp-pr-merged.yaml Each reusable's per-job 'permissions:' blocks remain intact and correctly request only what the job needs. Callers' existing grants are sufficient. Top-level 'permissions: read-all' stays on the OUTERMOST workflows (pr-created.yaml, pr-merged.yaml, build.yaml, bypass.yaml, component-tests.yaml, sign-object.yaml, scorecard.yml) — they're event-triggered, not workflow_call'd, so the read-all default still hardens GITHUB_TOKEN there.

* test(component): port Test_28 to upstream's unified user-overlay label Upstream PR kubescape#788 (Replace AP and NN cache with CP) collapsed the two legacy workload-keyed caches into a single ContainerProfileCache that reads ONE pod label `kubescape.io/user-defined-profile=<name>` and uses <name> as the lookup key for BOTH the user ApplicationProfile and the user NetworkNeighborhood. The fork's earlier two-label scheme (`user-defined-profile` for AP + separate `user-defined-network` for NN, with potentially different names) is no longer honored — the second label is silently ignored. Port: - tests/resources/nginx-user-defined-deployment.yaml: drop the `user-defined-network` label, point the surviving label at one shared name `curl-28-overlay`. - tests/component_test.go Test_28_UserDefinedNetworkNeighborhood: create both AP and NN under that single shared name. Assertions unchanged — the test still verifies that the user NN's egress restriction (only fusioncore.ai allowed on TCP/80) is enforced once the pod is running. Verified locally: go vet -tags=component ./tests/... clean; go test -tags=component -run='^$' ./tests/... compiles cleanly. * feat(cel): re-port CompareExecArgs hookup onto upstream's CP cache PR #35's wildcard-aware exec arg matching needs reapplication on top of the new ContainerProfileCache (upstream kubescape#788) baseline. The original PR sat on the legacy applicationprofilecache, which has been deleted; the call site now reads cp.Spec.Execs from a ContainerProfile. Same semantic change as PR #35: '⋯' (DynamicIdentifier) — matches exactly one argument position. '*' (WildcardIdentifier) — matches zero or more consecutive args. Wiring: - pkg/rulemanager/cel/libraries/applicationprofile/exec.go: drop slices.Compare exact-equality on the cp.Spec.Execs loop; route through dynamicpathdetector.CompareExecArgs. - go.mod: bump fork storage replace to feat/exec-arg-wildcards tip (3fc287210729) which carries the matcher. - exec_test.go: re-add TestExecWithArgsWildcardInProfile (13 subtests across curl --user ⋯, sh -c *, ls -l ⋯, echo hello *, plus negative literal-anchor / under-consumed-⋯ / mid-profile-* cases). Mirrors the test set that lived on PR #35 before the upstream merge. Verified: full applicationprofile package green (`go test ./pkg/rulemanager/cel/libraries/applicationprofile/`). * feat(rules): R0040 'Unexpected process arguments' + Test_32 e2e R0040 is an additive companion to R0001. It evaluates: ap.was_executed(...) && !ap.was_executed_with_args(..., event.args) so it ONLY fires when the exec'd path IS in the user-defined profile (R0001 stays silent) but the runtime arg vector does not match any profile entry's pattern. With wildcard tokens supported by dynamicpathdetector.CompareExecArgs: '⋯' (DynamicIdentifier) — exactly one argument position. '*' (WildcardIdentifier) — zero or more consecutive args. Use case: profile entry {Path: /bin/sh, Args: [sh, -c, *]} flags 'sh -x ...' as drift while permitting 'sh -c <anything>'. Wiring: - tests/chart/templates/node-agent/default-rules.yaml: new R0040 CEL rule definition immediately after R0001, same MITRE tagging (TA0002/T1059) and same applicationprofile-anomaly tag set. - tests/chart/templates/node-agent/default-rule-binding.yaml: R0040 added to the all-rules-all-pods binding next to R0001. - tests/resources/curl-exec-arg-wildcards-deployment.yaml: new fixture, curl pod labelled with the unified kubescape.io/user-defined-profile=curl-32-overlay label. - tests/component_test.go: Test_32_UnexpectedProcessArguments with 4 subtests: 32a sh_dash_c_matches_wildcard_trailing — sh -c <cmd> matches profile [sh, -c, *] — R0040 silent. 32b sh_dash_x_mismatches_R0040 — sh -x <cmd> mismatches the literal -c anchor — R0040 fires. 32c echo_hello_matches_wildcard_trailing — echo hello world matches [echo, hello, *] — R0040 silent. 32d echo_goodbye_mismatches_R0040 — echo goodbye world mismatches the literal hello anchor — R0040 fires. Verified locally: go vet -tags=component ./tests/... clean; go test -tags=component -run='^$' ./tests/... compiles cleanly. End-to-end alert assertions run in CI. * deps(storage): bump to rebased feat/exec-arg-wildcards tip (0de34ebc) After rebasing storage feat/exec-arg-wildcards onto storage main, the matcher branch now sits on top of the latest fork main commit (352395a3 — Internal-field merge fix). Bump the node-agent storage replace to that new pseudo-version so this branch's tests run against storage main + matcher in one consistent baseline. Verified locally: 47/47 non-eBPF unit packages green; vet clean; the applicationprofile CEL package's TestExecWithArgsWildcardInProfile is 13/13 green; component-tests compile under the component tag. The two failing packages (pkg/containerwatcher/v2/tracers and pkg/validator) fail with the same pre-existing /sys/fs/bpf mount-permission error they have on every recent run — env, not code. * ci(component-tests): add Test_32_UnexpectedProcessArguments to matrix The component-tests workflow uses a hardcoded matrix list, not a dynamic discovery from the test source. Test_32 (added in a613cf6) must be listed explicitly to be picked up — without this entry the test is silently skipped. * fix(containerprofilecache): re-wire R1016 tamper alert + expand Test_31 Upstream PR kubescape#788 (Replace AP and NN cache with CP) deleted the legacy applicationprofilecache where the fork's emitTamperAlert (commit c2d681e 'Feat/tamperalert' #22) lived. After the merge, R1016 alerts no longer fired for tampered user-defined profiles, breaking Test_31_TamperDetectionAlert (passed 3/3 on main, fails on the merged branch — confirmed regression introduced by PR #36). This restores the contract: every cache load of a user-supplied ApplicationProfile or NetworkNeighborhood overlay re-verifies the signature, and emits an R1016 'Signed profile tampered' alert through the rule-alert exporter when the signature is present but no longer valid. Alert shape preserved from the legacy cache so dashboards and component tests keep matching. Implementation: - new file pkg/objectcache/containerprofilecache/tamper_alert.go: verifyUserApplicationProfile / verifyUserNetworkNeighborhood / emitTamperAlert / extractWlidFromContainerID. Self-contained; keeps containerprofilecache.go diff small. - containerprofilecache.go: new tamperAlertExporter field + SetTamperAlertExporter setter + verify hooks immediately after GetApplicationProfile / GetNetworkNeighborhood succeed in the user-overlay branch of addContainer. - cmd/main.go: wire the alert exporter via SetTamperAlertExporter after the cache constructor (kept the constructor signature unchanged to avoid blast radius on tests). The setter is nil-safe: when no exporter is wired, verification still runs and is logged but no alert is emitted — matches the legacy behavior for unit-tests-with-no-exporter. Test_31 expanded from one scenario to four subtests, each in its own namespace to avoid alert cross-contamination: 31a tampered_user_defined_AP_fires_R1016 — original regression case 31b untampered_signed_AP_no_R1016 — negative: clean signature 31c unsigned_AP_no_R1016 — signing is opt-in 31d tampered_user_defined_NN_fires_R1016 — parallel NN code path Verified locally: - go build ./... clean - go test ./pkg/objectcache/containerprofilecache/... green - go test ./pkg/signature/... green - go vet -tags=component ./tests/... clean - go test -tags=component -run='^$' ./tests/... compiles * test(component): Test_32 profile uses full-path argv[0] Empirical finding from CI run 25178930763 — Test_32's positive subtests (32a sh_dash_c_matches, 32c echo_hello_matches) fired R0040 when they should not. Cause: at runtime, the eBPF tracer captures argv[0] as the FULL exec path (e.g. "/bin/sh") rather than the basename ("sh"). My profile entries used basenames, so the matcher's first-position literal compare missed and the cache fell through to 'no exec entry matches' — R0040 fires. Aligns Test_32's profile with the convention already used by Test_27's wildcard_yaml_profile_allowed_opens fixture (known-application-profile-wildcards.yaml predecessor): argv[0] is the full path, subsequent positions are flags/values. Subtest expectations after this fix: 32a sh -c <cmd> → matches [/bin/sh, -c, *] → R0040 silent 32b sh -x <cmd> → -c anchor mismatch → R0040 fires 32c echo hello <…> → matches [/bin/echo, hello, *]→ R0040 silent 32d echo goodbye <…> → hello anchor mismatch → R0040 fires * test: AP-fixture linter (R-AP-* rules) + canonical reference profile Catches the class of bug Test_32 hit on its first CI run (PR #37 run 25178930763): profile entries used basename argv[0] ("sh") while the eBPF tracer captures the full path ("/bin/sh"), so the matcher silently misses and the rule fires when it shouldn't. Without a linter, this kind of fixture drift only surfaces in a 15-minute component-test run on a kind cluster — too late, too expensive. The linter (LintApplicationProfile / LintApplicationProfileYAML in tests/resources/aplint_test.go) is intentionally written as a pure function returning []Violation. Zero testing-package coupling on the hot path so it can be lifted into a future bobctl subcommand `bobctl lint <ap.yaml>` without rewrite — see backlog at ~/biz/sbob-business-plan/state.yaml. Rules: R-AP-01 — kind must be ApplicationProfile R-AP-02 — at least one container R-AP-03 — container.name non-empty R-AP-10 — exec.path absolute (catches relative paths) R-AP-11 — exec.path no wildcards (binary identity is exact) R-AP-12 — exec.args[0] equals exec.path or wildcard token (Test_32-style argv[0] basename trap) R-AP-13 — exec.args wildcard tokens are whole-word (no embedding) R-AP-20 — open.path non-empty + absolute R-AP-21 — open.flags non-empty (real auto-recorded opens always have ≥1) R-AP-22 — open.flags from known O_* set (catches typos) Each rule has a dedicated self-test that constructs a minimal-bad YAML and asserts the rule fires (5 negative tests). One positive test (TestLinter_canonical_AP_passes) parses the fork's reference known-application-profile.yaml — extracted from a real auto-recorded AP for curlimages/curl:8.5.0 in fea3b06 — and asserts zero violations. The reference YAML is restored to tests/resources/ so the canonical shape is in-tree and visible to humans + CI. Why a Go test rather than a shell linter: keeps the rule set in the same language as the storage matcher (`dynamicpathdetector`), so extending CompareExecArgs and the linter together stays cheap. Local-cluster organic learning was the original plan but k3s on OrbStack is currently flapping (LXC-related boot loop). The fea3b06 profile was extracted from real auto-learning at an earlier moment of stability, which is the next-best ground truth. * fix(tamper_alert): accept self-signed profiles, only flag actual tamper Switch verifyUser{ApplicationProfile,NetworkNeighborhood} from strict VerifyObject to VerifyObjectAllowUntrusted. The strict variant requires a Sigstore Fulcio trust chain and rejects locally-signed profiles even when the signature against the embedded cert is valid. That made Test_31b 'untampered_signed_AP_no_R1016' fire R1016 against an untampered AP, and broke Test_30's 'tampered_profile_loaded_without_ enforcement' subtest the same way. The intent is: tamper detection, not trust-chain enforcement. Matches cmd/sign-object/main.go's default verifier. * test(component): make Test_30 30b deterministic by re-execing inside Eventually The single-shot wget exec before Eventually was racy: if the eBPF event landed before the CP cache projected the user-defined AP, the rule manager evaluated against an empty baseline and R0001 never fired within the 60s polling window. Same race Test_29 already documents. Drive the wget exec inside the Eventually loop (10s tick, 120s deadline) so cache-load latency is absorbed by retries. Filter R0001 to comm=wget to make the assertion specific instead of catching any R0001. Drops the blind 15s pre-sleep and the redundant settle-then-recount block. * deps(storage): bump replace to f44fed80 (analyzer trailing-* fix) Picks up the upstream-PR-kubescape#316 review fix: trailing WildcardIdentifier now requires at least one regular-path segment, matching standard glob semantics. Closes the R0002 blind spot where '/etc/*' would silently match the bare '/etc' directory. * deps(storage): bump replace to 4ab95fb8 (PR #25 merged on fork main) Pulls in the full PR-kubescape#316 review fix set that just landed on storage main: proper splitPath-based trailing-* anchoring, DefaultCollapseConfigs() defensive-copy accessor, FindConfigForPath value-return, splitEndpoint defensive guard, plus the BenchmarkCompareDynamic baseline. * test(component): Test_33_AnalyzeOpensWildcardAnchoring End-to-end pin of the storage-side CompareDynamic contract through R0002. Each subtest deploys a fresh nginx pod with a user-defined AP carrying ONE Opens entry, then `cat`s a target path that probes a boundary case from the storage analyzer fixes (kubescape/storage kubescape#316 review by matthyx + entlein): - Anchored trailing `*` matches one OR MORE remaining segments — never zero. So /etc/* matches /etc/passwd but NOT bare /etc. - DynamicIdentifier (⋯) consumes EXACTLY ONE segment. - Mid-path `*` is zero-or-more, so /etc/*/* matches /etc/ssh (inner * consumes zero, trailing * consumes one). - Mixed ⋯/* combinations: ⋯ pins one, * consumes the rest. - splitPath normalises trailing slashes on both sides. 11 subtests covering: trailing_star_matches_immediate_child — basic /etc/* match trailing_star_matches_deep_child — multi-segment under prefix trailing_star_does_not_match_bare_parent — the security fix deep_prefix_trailing_star_does_not_match_parent — same rule, deeper ellipsis_pin_one_segment_then_literal — ⋯ rejects zero ellipsis_then_trailing_star_matches_two_* — ⋯/* combo, 2 levels ellipsis_then_trailing_star_matches_three_* — ⋯/* combo, 3 levels double_trailing_matches_one_child — /*/* mid-zero double_trailing_matches_deep_child — /*/* mid-one double_trailing_does_not_match_parent — /*/* needs ≥1 child trailing_slash_in_profile_normalises_to_literal — splitPath on profile Pinned at component level on TOP of the unit suite in storage/pkg/registry/file/dynamicpathdetector/tests/coverage_test.go. Both layers must agree — a drift in either lights up R0002 with a false positive or false negative. Matrix entry added to component-tests.yaml so the test runs in CI. * test(component): rework Test_33 negative cases to probe under R0002's monitored prefix R0002's CEL ruleExpression has a strict path-prefix filter: event.path.startsWith('/etc/') || event.path.startsWith('/var/log/') || ... All with trailing slash. Bare /etc and /var/log don't pass the filter, so R0002 never evaluates on those events — the matcher's bare-parent anchoring contract stays invisible at runtime even though the storage unit tests pin it. Probe one level deeper instead: /etc/ssl IS under the /etc/ monitored prefix, so the rule CAN see whether a /etc/ssl/* profile entry matches the bare /etc/ssl parent. Same security guarantee, observable layer. Reworked subtests: - trailing_star_does_not_match_bare_parent_under_monitored_prefix: profile /etc/ssl/*, cat /etc/ssl → R0002 fires - deep_prefix_trailing_star_does_not_match_parent: profile /etc/ssl/certs/*, cat /etc/ssl/certs → R0002 fires - ellipsis_requires_one_segment_not_zero: profile /etc/passwd/⋯, cat /etc/passwd → ⋯ requires one more segment - double_trailing_does_not_match_parent_under_monitored_prefix: profile /etc/ssl/*/*, cat /etc/ssl → R0002 fires The 7 positive subtests that already passed are untouched. Added a comment block documenting why we probe at /etc/ssl rather than /etc. * test(component): fix Test_28 + Test_31 31b flakiness Two distinct fixes for what looked like the same intermittent failures across PR #37 runs: Test_31 31b 'untampered_signed_AP_no_R1016' — root cause: storage's PreSave runs DeflateSortString on Syscalls (and Capabilities, Architectures), which sorts + dedupes. The signSignedAP helper signed the AP BEFORE pushing, against unsorted syscalls {socket, connect, read, write, close, openat}. After PreSave the stored AP had sorted {close, connect, openat, read, socket, write}, so the content hash differed from the signature → server-side verify correctly failed → R1016 fired even though the profile was untampered. Test_29 + Test_30 30b had the same fixture but didn't observe the bug because they only assert R0001 counts, never R1016. Pre-sort the syscalls in all three test fixtures so storage's normalization is a no-op on round-trip. Test_28 28a 'allowed_fusioncore_no_alert' — root cause: 15s post-deploy sleep wasn't always enough for the upstream ContainerProfileCache to project the user-defined NN. Failure mode is alert payload `profileMetadata.errorMessage:"waiting for profile update"` — the rule manager evaluated against an unloaded NN and fired R0005/R0011 spuriously. Bumped to 30s with a comment documenting why. A real fix would poll a cache-loaded signal but no such signal is exposed from outside the node-agent today. * test(component): sign-after-roundtrip in Test_31 to defeat content-drift R1016 false positive Test_31 31b 'untampered_signed_AP_no_R1016' kept flaking because the AP's content hash drifted between client-side sign and server-side verify across the K8s/storage roundtrip. Sources of drift include storage's PreSave normalisation (DeflateSortString, DeflateStringer, DeflateRulePolicies), signature/profiles GetContent's nil→empty-map mutation on PolicyByRuleId, and any K8s server-side defaulting of spec/metadata fields. Pre-sorting Syscalls in the previous fix only covered one of these. Sign-after-roundtrip closes the whole class: 1. Push the AP UNSIGNED to storage. PreSave runs, normalises content. 2. Read it back — this is what node-agent will see at verify time. 3. Sign THAT normalised content. 4. Delete the unsigned in-storage copy so deployAndWait can Create the signed version without an AlreadyExists conflict. 5. Strip server-managed metadata (resourceVersion / uid / etc.) from the returned AP so the second Create succeeds cleanly. Second push goes through deflate again. Idempotent on already-normalised content → stored bytes identical to signed bytes → content hash matches → verify succeeds → no R1016 false positive. Tampered subtests (31a, 31d) keep working: signSignedAP returns a known-good signed AP, the test mutates it post-helper, deployAndWait Creates the mutated version, storage round-trip preserves the mutation, and verify correctly detects the divergence. * test(component): bump Test_33 WaitForReady to 180s for cluster-pressure tolerance Test_33 deploys 11 fresh pods sequentially, one per subtest. Later subtests race against an increasingly loaded kind cluster — CP cache reconciler, alertmanager, prometheus all chew CPU at boot. 80s WaitForReady deadline timed out on the post-23ea224 run with 'workload not ready in ns ...' for early subtests once the cluster got busy. 180s gives headroom without changing total runtime regime. * deps(storage): bump replace to 43795bb4 (storage feat/exec-arg-wildcards tip — CRDs + CompareExecArgs) * test(aplint): drop redundant p := p loop var (Go 1.22+, copyloopvar lint) CodeRabbit (PR #38). Loop variable shadowing is no longer required since Go 1.22 made each iteration capture its own variable. The shadow trips golangci-lint's copyloopvar check. * fix(tamper_alert): R1016 dedup + use real WLID CodeRabbit (PR #38) flagged two issues on the refresh-loop tamper path: 1. Dedup R1016 (Major). The cache refresh re-runs verifyUser*() on every reconcile cycle, so a long-lived tampered profile would emit R1016 on every cycle and once per container reference. Track emitted alerts in a sync.Map keyed on (kind|ns/name@resourceVersion). LoadOrStore gives atomic 'first transition to invalid' semantics; a re-tamper at a new RV uses a fresh key and alerts again. Verification passing clears the key so future tampers re-alert. 2. Pass real WLID (Major). emitTamperAlert previously called extractWlidFromContainerID(containerID) — but containerID is a runtime ID like 'containerd://...' which GenericRuleFailure.SetWorkloadDetails parses as a WLID and silently drops kind/name/cluster attribution. Thread sharedData.Wlid (already in scope at the call site in containerprofilecache.go) through verifyUser*() into emitTamperAlert. Drop extractWlidFromContainerID — no longer needed. Test_31_TamperDetectionAlert exercises this path end-to-end; expecting it to keep passing (one alert per tampered profile, with proper workload attribution in Alertmanager). * deps(storage): bump replace to b0d68d3d (empty-Args wildcard match) Picks up storage's CompareExecArgs fix for path-only Execs entries (CodeRabbit PR #38, finding #3). With this bump, R0040 no longer fires on every legit exec of a path that the user-defined profile allowed but didn't constrain by argv. * fix: address CodeRabbit second-review batch on PR #38 Three CR concerns addressed in one commit (all submitted 2026-05-04 10:09): 1. tamper_alert.go (Major) — only emit R1016 on actual signature mismatch. Previously every error from VerifyObjectAllowUntrusted (hash computation, verifier construction, malformed annotations, signature mismatch) was treated as a tamper event. With EnableSignatureVerification=true that meant a transient operational failure could drop a valid overlay AND emit a false R1016. Fix: - Add signature.ErrSignatureMismatch sentinel in pkg/signature/annotations.go - Wrap the signature-fail return in verify.go with the sentinel (Go 1.20+ multiple-%w form) - Classify in tamper_alert.go via errors.Is(err, ErrSignatureMismatch); operational errors log with a 'NOT tamper' tag, do NOT touch the tamperEmitted dedup map, and do NOT emit R1016. They still honour strict-mode (return !EnableSignatureVerification) so behaviour is conservative. Adds tamper_alert_test.go: pins LoadOrStore semantics, confirms each operational-error variant returns false on errors.Is(ErrSignatureMismatch), smoke-tests the sentinel value. 2. aplint_test.go:221 (Minor) — replace strings.Contains substring check for fixture-type detection with proper YAML-header parse. The substring form silently skipped fixtures with quoted/non-canonical 'kind' values AND would false-positive on nested OwnerReferences with the same substring. Now uses sigs.k8s.io/yaml (already imported) to unmarshal into a {Kind string} struct and branch on the parsed value. 3. aplint_test.go R-AP-12 (Major, 'Heavy lift') — REJECTED with reasoning to be posted as CR reply. Evidence shows R-AP-12 enforces the actual runtime contract: - pkg/rulemanager/cel/libraries/parse/parse.go's getExecPath returns args[0] (the full binary path) - ap.was_executed_with_args(containerID, parse.get_exec_path(event.args, event.comm), event.args) passes the FULL argv to the matcher - Integration test pkg/rulemanager/cel/libraries/applicationprofile/ integration_test.go uses ["/bin/bash", "-c", ...] — full argv - Real fixture tests/resources/known-application-profile.yaml uses args: ["/bin/sleep", "infinity"] with args[0] == path The compare_exec_args_test.go cases that look like flags-only ([-s, ⋯]) are matcher-isolation unit tests, not the wired contract. Removing R-AP-12 would let users author flags-only profiles that silently fail to silence R0040 in production. Bumps storage replace to a7e6234349ab (storage main HEAD post-PR #27). * fix: address CodeRabbit third-review batch on PR #38 (0cf4a50) Three new findings on the May 9 review batch: 1. aplint_test.go R-AP-13 (Major) — embedded * was not flagged, only embedded ⋯. An arg like "foo*bar" linted clean, defeating the purpose of the wildcard-must-be-standalone rule. Added a parallel check for embedded wildcardIdentifier mirroring the existing dynamicIdentifier check. 2. aplint_test.go canonical-fixture skip path (Major) — t.Skipf on missing reference fixture silently disabled the gold-standard test if someone deleted/renamed it. Switched to t.Fatalf with a message that explains the file's role. Also switched the YAML-parse-failure path from Skipf to Fatalf — un-parseable YAML in tests/resources/ is a fixture-quality bug, not something to skip past. Kept Skipf only for the kind!=AP path, which correctly handles the legit non-AP fixtures (Deployments, Services etc.) that share the directory. 3. tamper_alert_test.go (Trivial nitpick, 'Low value') — added an integration-style test that exercises the full verifyUserApplicationProfile path: real signature.SignObjectDisableKeyless, tamper the spec, call verifyUserApplicationProfile, assert tamperEmitted carries the key. Then re-sign at a new RV and assert the dedup is cleared. Confirms the wiring 'verifier returns ErrSignatureMismatch → classify as tamper → LoadOrStore in dedup map' actually fires, not just the LoadOrStore call in isolation. Uses the real cosign adapter — no mocking needed. * fix(test): make dedup-clearing assertion non-trivial CodeRabbit nitpick on PR #38 (tamper_alert_test.go:159): the prior version of TestVerifyAP_TamperedProfile_PopulatesDedupMap bumped the profile's ResourceVersion to '43' before the re-sign, then asserted that the key for RV='43' was absent from tamperEmitted. But '43' was never added in the first place — only '42' was added during the tamper phase. The assertion was trivially true. Drop the RV bump: re-sign at the SAME RV='42' so that verifyUserApplicationProfile takes the verify-clean branch and Deletes the existing dedup entry. The assertion now actually exercises the clearing path. Production code unchanged. * fix(ci): drop 'permissions: read-all' from reusable workflows Chore PR #39 added 'permissions: read-all' to every workflow file including the workflow_call reusables. That broke the reusable-call contract: a called workflow's top-level permissions are the FLOOR the caller must grant. Callers of these reusables (pr-created.yaml's pr-created job, pr-merged.yaml's pr-merged job, etc.) only grant a narrow set of scopes (id-token:write, packages:write, security-events: write, pull-requests:write, contents). They do NOT grant the full read-all set (actions:read, artifact-metadata:read, attestations:read, checks:read, deployments:read, discussions:read, issues:read, models:read, vulnerability-alerts:read, packages:read, pages:read, repository-projects:read, statuses:read, id-token:read). Result: every PR opened on this fork since chore #39 merged (May 6) has had pr-created.yaml startup_failure with: Error calling workflow ... incluster-comp-pr-created.yaml@<sha>. The workflow is requesting 'actions: read, ...', but is only allowed [...] Fix: strip top-level 'permissions: read-all' from the four reusables: - benchmark.yaml (also direct-trigger; falls back to per-job perms) - go-basic-tests.yaml - incluster-comp-pr-created.yaml - incluster-comp-pr-merged.yaml Each reusable's per-job 'permissions:' blocks remain intact and correctly request only what the job needs. Callers' existing grants are sufficient. Top-level 'permissions: read-all' stays on the OUTERMOST workflows (pr-created.yaml, pr-merged.yaml, build.yaml, bypass.yaml, component-tests.yaml, sign-object.yaml, scorecard.yml) — they're event-triggered, not workflow_call'd, so the read-all default still hardens GITHUB_TOKEN there. --------- Co-authored-by: Entlein <eineintlein@gmail.com>

coderabbitai Bot requested changes May 4, 2026

View reviewed changes

Comment thread .github/workflows/benchmark.yaml

Comment thread .github/workflows/go-basic-tests.yaml

coderabbitai Bot approved these changes May 6, 2026

View reviewed changes

entlein merged commit 83bc87a into main May 6, 2026
1 check passed

entlein deleted the chore/token-permissions-least-privilege branch May 6, 2026 17:50

coderabbitai Bot mentioned this pull request May 9, 2026

Merge/exec arg wildcards #38

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add explicit top-level permissions to all workflows (least privilege)#39

ci: add explicit top-level permissions to all workflows (least privilege)#39
entlein merged 2 commits intomainfrom
chore/token-permissions-least-privilege

entlein commented May 4, 2026

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

entlein commented May 4, 2026

Why

What

Coverage

Risk

Test plan

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 4, 2026 •

edited

Loading